Note: Increased IOPub

For visualization, if you did not start this notebook with an increased data rate limit, stop the notebook, go back to the command line, and start Jupyter Notebook using the following command.

jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000000

Import the Python API module and Instantiate the GIS object

Import the Python API


In [1]:
import arcgis
from getpass import getpass
import pandas as pd


---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-d5c8cc5010e9> in <module>()
----> 1 import arcgis
      2 from getpass import getpass
      3 import pandas as pd

ImportError: No module named 'arcgis'

Create an GIS object instance using the account currently logged in through ArcGIS Pro


In [2]:
# gis_retail = arcgis.gis.GIS('Pro')  # optional, use if can use ArcGIS Pro authentication
gis = arcgis.gis.GIS(
    username='jmccune_retail', 
    password=getpass('Please enter the headless password: ')
)


Please enter the headless password: ········

Get a Data Frame, data to work with, from the Web GIS Item ID

Create a Web GIS Item instance using the Item ID


In [3]:
trade_area_itemid = '44c0b666404c41b8a14e04c34d09be1e'
item = arcgis.gis.Item(gis=gis, itemid=trade_area_itemid)
item


Out[3]:
targetWithDemographics
All same stores with huge variety of demographics associated with the locations.Feature Layer Collection by joel5174@esri.com_commteamretail
Last Modified: August 15, 2017
0 comments, 15 views

Since the item only contains one feature layer, get the first layer in the item, the Feature Layer we need to work with.


In [4]:
feature_layer = item.layers[0]
feature_layer


Out[4]:
<FeatureLayer url:"https://services.arcgis.com/PMTtzuTB6WiPuNSv/arcgis/rest/services/targetWithDemographics/FeatureServer/0">

Use query to return the data as a Feature Set.


In [5]:
feature_set = feature_layer.query()

In [6]:
df_fields = pd.DataFrame([(field['name'], field['alias']) for field in feature_set.fields], columns=['NAME', 'ALIAS'])
df_fields


Out[6]:
NAME ALIAS
0 OBJECTID Object ID
1 Target_Locations_LOCNUM LOCNUM
2 Target_Locations_CONAME CONAME
3 Target_Locations_STREET STREET
4 Target_Locations_CITY CITY
5 Target_Locations_STATE STATE
6 Target_Locations_ZIP ZIP
7 Target_Locations_ZIP4 ZIP4
8 Target_Locations_NAICS NAICS
9 Target_Locations_SIC SIC
10 Target_Locations_SALESVOL SALESVOL
11 Target_Locations_SQFTCODE SQFTCODE
12 overlayTemplate_TOTPOP_CY 2016 Total Population
13 overlayTemplate_HHPOP_CY 2016 Population in Households
14 overlayTemplate_FAMPOP_CY 2016 Population in Families
15 overlayTemplate_GQPOP_CY 2016 Population in Group Quarters
16 overlayTemplate_POPDENS_CY 2016 Population Density
17 overlayTemplate_TOTHH_CY 2016 Total Households
18 overlayTemplate_AVGHHSZ_CY 2016 Average Household Size
19 overlayTemplate_FAMHH_CY 2016 Family Households
20 overlayTemplate_AVGFMSZ_CY 2016 Average Family Size
21 overlayTemplate_TOTHU_CY 2016 Total Housing Units
22 overlayTemplate_OWNER_CY 2016 Owner Occupied HUs
23 overlayTemplate_RENTER_CY 2016 Renter Occupied HUs
24 overlayTemplate_VACANT_CY 2016 Vacant Housing Units
25 overlayTemplate_POPGRW10CY 2010-2016 Growth Rate: Population
26 overlayTemplate_HHGRW10CY 2010-2016 Growth Rate: Households
27 overlayTemplate_FAMGRW10CY 2010-2016 Growth Rate: Families
28 overlayTemplate_NOHS_CY 2016 Education: < 9th Grade
29 overlayTemplate_SOMEHS_CY 2016 Education: High School/No Diploma
... ... ...
130 overlayTemplate_INDEDUC_CY 2016 Industry: Educational Services
131 overlayTemplate_INDHLTH_CY 2016 Industry: Health Care
132 overlayTemplate_INDARTS_CY 2016 Industry: Arts/Entertainment/Rec
133 overlayTemplate_INDFOOD_CY 2016 Industry: Accommodation/Food Svcs
134 overlayTemplate_INDOTSV_CY 2016 Industry: Other Services
135 overlayTemplate_INDPUBL_CY 2016 Industry: Public Administration
136 overlayTemplate_UNEMP_CY 2016 Unemployed Population 16+
137 overlayTemplate_UNEMPRT_CY 2016 Unemployment Rate
138 overlayTemplate_OCCMGMT_CY 2016 Occupation: Management
139 overlayTemplate_OCCBUS_CY 2016 Occupation: Business/Financial
140 overlayTemplate_OCCCOMP_CY 2016 Occupation: Computer/Mathmatical
141 overlayTemplate_OCCARCH_CY 2016 Occupation: Architecture/Engineer
142 overlayTemplate_OCCSSCI_CY 2016 Occupation: Life/Social Sciences
143 overlayTemplate_OCCSSRV_CY 2016 Occupation: Social Service
144 overlayTemplate_OCCLEGL_CY 2016 Occupation: Legal
145 overlayTemplate_OCCEDUC_CY 2016 Occupation: Education/Library
146 overlayTemplate_OCCENT_CY 2016 Occupation: Arts/Entertainment/Rec
147 overlayTemplate_OCCHTCH_CY 2016 Occupation: Health Practices
148 overlayTemplate_OCCHLTH_CY 2016 Occupation: Health Support
149 overlayTemplate_OCCPROT_CY 2016 Occupation: Protective Service
150 overlayTemplate_OCCFOOD_CY 2016 Occupation: Food Preperation
151 overlayTemplate_OCCBLDG_CY 2016 Occupation: Building Maintenance
152 overlayTemplate_OCCPERS_CY 2016 Occupation: Personal Care
153 overlayTemplate_OCCSALE_CY 2016 Occupation: Sales
154 overlayTemplate_OCCADMN_CY 2016 Occupation: Office/Admin
155 overlayTemplate_OCCFARM_CY 2016 Occupation: Farm/Fish/Forestry
156 overlayTemplate_OCCCONS_CY 2016 Occupation: Construction/Extraction
157 overlayTemplate_OCCMAIN_CY 2016 Occupation: Maintenance/Repair
158 overlayTemplate_OCCPROD_CY 2016 Occupation: Production
159 overlayTemplate_OCCTRAN_CY 2016 Occupation: Transportation/Moving

160 rows × 2 columns

Take advantage of the df function on the Feature set object returned from the query to convert the data to a Pandas Data Frame.


In [7]:
df = feature_set.df
df.head()


Out[7]:
OBJECTID Target_Locations_CITY Target_Locations_CONAME Target_Locations_LOCNUM Target_Locations_NAICS Target_Locations_SALESVOL Target_Locations_SIC Target_Locations_SQFTCODE Target_Locations_STATE Target_Locations_STREET ... overlayTemplate_VAL200K_CY overlayTemplate_VAL250K_CY overlayTemplate_VAL300K_CY overlayTemplate_VAL400K_CY overlayTemplate_VAL500K_CY overlayTemplate_VAL50K_CY overlayTemplate_VAL750K_CY overlayTemplate_WHITE_CY overlayTemplate_WIDOWED_CY SHAPE
0 1 SEASIDE TARGET 666990510 45211101 35495 531102 D CA CALIFORNIA AVE ... 200 354 934 693 1976 81 1012 23576 1902 {'x': -121.84299999999993, 'y': 36.62100000000...
1 2 MARINA TARGET 653371815 45211101 35495 531102 D CA GENERAL STILLWELL DR ... 137 270 1043 742 1612 81 588 21068 1500 {'x': -121.81120000000001, 'y': 36.66760000000...
2 3 CAPITOLA TARGET 423468472 45211101 35495 531102 D CA 41ST AVE ... 372 405 1015 1271 4018 261 2452 41050 2524 {'x': -121.96509999999991, 'y': 36.9753000000001}
3 4 WATSONVILLE TARGET 511743478 45211101 35495 531102 D CA MAIN ST ... 456 449 870 982 1211 197 293 19905 1323 {'x': -121.77399999999992, 'y': 36.91540000000...
4 5 CUPERTINO TARGET 404459478 45211101 52059 531102 D CA STEVENS CREEK BLVD ... 43 49 211 519 2823 110 7694 41689 5226 {'x': -122.03619999999995, 'y': 37.32310000000...

5 rows × 161 columns

Because I am neurotic, and hate the prefixes added to the field names from the join when preparing the data in ArcMap, we are cleaning them up a little here.


In [8]:
df.columns = [field.replace('Target_Locations_', '').replace('overlayTemplate_', '') for field in df.columns]
df.head()


Out[8]:
OBJECTID CITY CONAME LOCNUM NAICS SALESVOL SIC SQFTCODE STATE STREET ... VAL200K_CY VAL250K_CY VAL300K_CY VAL400K_CY VAL500K_CY VAL50K_CY VAL750K_CY WHITE_CY WIDOWED_CY SHAPE
0 1 SEASIDE TARGET 666990510 45211101 35495 531102 D CA CALIFORNIA AVE ... 200 354 934 693 1976 81 1012 23576 1902 {'x': -121.84299999999993, 'y': 36.62100000000...
1 2 MARINA TARGET 653371815 45211101 35495 531102 D CA GENERAL STILLWELL DR ... 137 270 1043 742 1612 81 588 21068 1500 {'x': -121.81120000000001, 'y': 36.66760000000...
2 3 CAPITOLA TARGET 423468472 45211101 35495 531102 D CA 41ST AVE ... 372 405 1015 1271 4018 261 2452 41050 2524 {'x': -121.96509999999991, 'y': 36.9753000000001}
3 4 WATSONVILLE TARGET 511743478 45211101 35495 531102 D CA MAIN ST ... 456 449 870 982 1211 197 293 19905 1323 {'x': -121.77399999999992, 'y': 36.91540000000...
4 5 CUPERTINO TARGET 404459478 45211101 52059 531102 D CA STEVENS CREEK BLVD ... 43 49 211 519 2823 110 7694 41689 5226 {'x': -122.03619999999995, 'y': 37.32310000000...

5 rows × 161 columns

Store Segmentation Using KMeans Clustering

Use KMeans cluster analysis from the Sci-Kit Learn package to segment the demographically similar stores together.


In [9]:
from sklearn.cluster import KMeans
import pandas as pd
import numpy as np

%matplotlib inline

Create the KMeans clusterer, specifying it to create five clusters, or segments.


In [10]:
km = KMeans(n_clusters=5)

Use the fit_predict method to create the clusters using just the numeric fields - not including the OID and sales fields.


In [11]:
df_independent = df.select_dtypes(include=[np.number]).drop(['OBJECTID', 'SALESVOL'], axis=1)
fields_independent = df_independent.columns  # save for later
field_name_dependent = 'SALESVOL' # keep track of this as well

In [12]:
df['segment'] = km.fit_predict(df_independent)

In [13]:
df.head()


Out[13]:
OBJECTID CITY CONAME LOCNUM NAICS SALESVOL SIC SQFTCODE STATE STREET ... VAL250K_CY VAL300K_CY VAL400K_CY VAL500K_CY VAL50K_CY VAL750K_CY WHITE_CY WIDOWED_CY SHAPE segment
0 1 SEASIDE TARGET 666990510 45211101 35495 531102 D CA CALIFORNIA AVE ... 354 934 693 1976 81 1012 23576 1902 {'x': -121.84299999999993, 'y': 36.62100000000... 0
1 2 MARINA TARGET 653371815 45211101 35495 531102 D CA GENERAL STILLWELL DR ... 270 1043 742 1612 81 588 21068 1500 {'x': -121.81120000000001, 'y': 36.66760000000... 0
2 3 CAPITOLA TARGET 423468472 45211101 35495 531102 D CA 41ST AVE ... 405 1015 1271 4018 261 2452 41050 2524 {'x': -121.96509999999991, 'y': 36.9753000000001} 4
3 4 WATSONVILLE TARGET 511743478 45211101 35495 531102 D CA MAIN ST ... 449 870 982 1211 197 293 19905 1323 {'x': -121.77399999999992, 'y': 36.91540000000... 0
4 5 CUPERTINO TARGET 404459478 45211101 52059 531102 D CA STEVENS CREEK BLVD ... 49 211 519 2823 110 7694 41689 5226 {'x': -122.03619999999995, 'y': 37.32310000000... 3

5 rows × 162 columns

Investigate the Results

Now, take a look at a few descriptive metrics to begin to understand the results - initially by creating a data frame with the segments, mean sales for each segment, median sales for each segment, and the median sales for each segment.


In [14]:
df_count = df.segment.value_counts().reset_index().rename(columns={'index': 'segment', 'segment': 'count'})
df_mean = df.groupby('segment').mean()[field_name_dependent].reset_index().rename(
    columns={'segment': 'segment', field_name_dependent: 'mean_sales'})
df_median = df.groupby('segment').median()[field_name_dependent].reset_index().rename(
    columns={'segment': 'segment', field_name_dependent: 'median_sales'})

In [15]:
df_summary = pd.merge(df_count, df_mean)
df_summary = pd.merge(df_summary, df_median)
df_summary = df_summary.sort_values('mean_sales', ascending=False)
df_summary['mean_sales_zscore'] = df_summary.apply(lambda row: (row.mean_sales - df.SALESVOL.mean()) / df.SALESVOL.std(ddof=0), axis=1)
print('Sales Mean: {:.2f}\nSales Standard Deviation: {:.2f}'.format(df.SALESVOL.mean(), df.SALESVOL.std()))
df_summary


Sales Mean: 35147.10
Sales Standard Deviation: 17891.99
Out[15]:
segment count mean_sales median_sales mean_sales_zscore
4 3 52 45732.500000 42130.5 0.591792
3 1 208 40657.115385 35495.0 0.308045
2 2 360 38065.166667 35495.0 0.163139
1 4 536 34590.748134 33751.0 -0.031103
0 0 652 31391.226994 32268.0 -0.209977

Now, using data from this data frame, visualize the means of each segment against the mean, and one-half standard deviation above and below the mean.


In [18]:
ax = df_summary.sort_values('mean_sales').plot.bar(y='mean_sales', x='segment', figsize=(15, 8))
ax.set_ylabel('mean sales')
ax.set_title('Mean Sales by KMeans Segment')

ax.axhline(y=df.SALESVOL.mean(), color='r') # add the mean for reference
ax.axhline(y=df.SALESVOL.mean() - df.SALESVOL.std() * 0.5, color='g')
ax.axhline(y=df.SALESVOL.mean() + df.SALESVOL.std() * 0.5, color='g')


Out[18]:
<matplotlib.lines.Line2D at 0x111da9438>

In [21]:
df['segment_3'] = df.segment == 3
df


Out[21]:
OBJECTID CITY CONAME LOCNUM NAICS SALESVOL SIC SQFTCODE STATE STREET ... VAL300K_CY VAL400K_CY VAL500K_CY VAL50K_CY VAL750K_CY WHITE_CY WIDOWED_CY SHAPE segment segment_3
0 1 SEASIDE TARGET 666990510 45211101 35495 531102 D CA CALIFORNIA AVE ... 934 693 1976 81 1012 23576 1902 {'x': -121.84299999999993, 'y': 36.62100000000... 0 False
1 2 MARINA TARGET 653371815 45211101 35495 531102 D CA GENERAL STILLWELL DR ... 1043 742 1612 81 588 21068 1500 {'x': -121.81120000000001, 'y': 36.66760000000... 0 False
2 3 CAPITOLA TARGET 423468472 45211101 35495 531102 D CA 41ST AVE ... 1015 1271 4018 261 2452 41050 2524 {'x': -121.96509999999991, 'y': 36.9753000000001} 4 False
3 4 WATSONVILLE TARGET 511743478 45211101 35495 531102 D CA MAIN ST ... 870 982 1211 197 293 19905 1323 {'x': -121.77399999999992, 'y': 36.91540000000... 0 False
4 5 CUPERTINO TARGET 404459478 45211101 52059 531102 D CA STEVENS CREEK BLVD ... 211 519 2823 110 7694 41689 5226 {'x': -122.03619999999995, 'y': 37.32310000000... 3 True
5 6 SAN JOSE TARGET 373128867 45211101 84715 531102 D CA SARATOGA AVE ... 408 1002 7996 136 10636 70131 6289 {'x': -121.99069999999993, 'y': 37.29280000000... 3 True
6 7 SUNNYVALE TARGET 402344537 45211101 35495 531102 D CA W MC KINLEY AVE ... 874 1620 6575 351 7622 54321 4554 {'x': -122.03229999999998, 'y': 37.37370000000... 3 True
7 8 SAN JOSE TARGET 637354200 45211101 35495 531102 D CA BLOSSOM HILL RD ... 2347 4110 16914 847 8664 95636 6032 {'x': -121.86139999999993, 'y': 37.25050000000... 3 True
8 9 SAN JOSE TARGET 435039879 45211101 35495 531102 D CA COTTLE RD ... 2517 4531 12939 862 2895 60248 3932 {'x': -121.80389999999996, 'y': 37.24990000000... 1 False
9 10 SAN JOSE TARGET 230021602 45211101 70990 531102 D CA HILLSDALE AVE ... 994 2572 15838 284 11823 105854 6740 {'x': -121.91809999999997, 'y': 37.26320000000... 3 True
10 11 SANTA CLARA TARGET 706082503 45211101 35495 531102 D CA EL CAMINO REAL ... 1024 2219 11352 143 5843 64144 4600 {'x': -121.96059999999993, 'y': 37.35230000000... 1 False
11 12 SAN JOSE TARGET 427515311 45211101 35495 531102 D CA COLEMAN AVE ... 1530 2638 6295 118 2191 57534 3499 {'x': -121.9045, 'y': 37.339100000000066} 4 False
12 13 SAN JOSE TARGET 653371856 45211101 35495 531102 D CA MONTEREY HWY ... 2668 2699 6629 935 3005 54621 4656 {'x': -121.86229999999993, 'y': 37.30460000000... 2 False
13 14 SAN JOSE TARGET 230020265 45211101 59158 531102 D CA SILVER CREEK RD ... 3609 4905 9288 951 2819 45850 6138 {'x': -121.81259999999997, 'y': 37.30770000000... 1 False
14 15 SAN JOSE TARGET 608450706 45211101 35495 531102 D CA STORY RD ... 4372 3769 6149 528 1173 57430 6657 {'x': -121.84059999999995, 'y': 37.33980000000... 4 False
15 16 SAN JOSE TARGET 224908368 45211101 47327 531102 D CA N CAPITOL AVE ... 3988 4141 9327 303 2652 49144 6969 {'x': -121.84499999999997, 'y': 37.37430000000... 2 False
16 17 PETALUMA TARGET 434053366 45211101 35495 531102 D CA KENILWORTH DR ... 2108 2737 4224 140 1045 40488 2441 {'x': -122.62839999999994, 'y': 38.24430000000... 4 False
17 18 ROHNERT PARK TARGET 493062277 45211101 35495 531102 D CA ROHNERT PARK EXPY W ... 1549 1277 1716 578 420 32034 2041 {'x': -122.72029999999997, 'y': 38.34690000000... 0 False
18 19 SANTA ROSA TARGET 963164819 45211101 40228 531102 D CA SANTA ROSA AVE ... 3734 2363 2468 407 318 59232 3254 {'x': -122.71349999999993, 'y': 38.41910000000... 4 False
19 20 SAN BRUNO TARGET 988513198 45211101 118317 531102 D CA EL CAMINO REAL ... 1276 1917 9866 129 5783 43606 6359 {'x': -122.41899999999995, 'y': 37.63690000000... 1 False
20 21 REDWOOD CITY TARGET 230176208 45211101 47327 531102 D CA EL CAMINO REAL ... 349 596 3232 58 3640 47312 3054 {'x': -122.2177, 'y': 37.4753000000001} 2 False
21 22 FOSTER CITY TARGET 974408171 45211101 47327 531102 C CA BRIDGEPOINTE PKWY ... 539 1035 6319 77 5222 42560 3881 {'x': -122.28379999999994, 'y': 37.56030000000... 1 False
22 23 DALY CITY TARGET 212308704 45211101 106486 531102 D CA SERRAMONTE CTR ... 1865 3843 18845 354 6784 58816 9890 {'x': -122.46900000000001, 'y': 37.66920000000... 3 True
23 24 COLMA TARGET 230019408 45211101 94654 531102 D CA JUNIPERO SERRA BLVD ... 1336 3117 15394 317 5119 46690 8154 {'x': -122.46469999999998, 'y': 37.67330000000... 1 False
24 25 SAN FRANCISCO TARGET 715933937 45211101 35495 531102 D CA OCEAN AVE ... 371 1030 6426 84 5976 36031 4408 {'x': -122.46109999999996, 'y': 37.72560000000... 1 False
25 26 SAN FRANCISCO TARGET 433910516 45211101 35495 531102 D CA GEARY BLVD ... 380 795 3525 114 3686 85223 6066 {'x': -122.44589999999998, 'y': 37.78230000000... 1 False
26 27 SAN FRANCISCO TARGET 424165589 45211101 1893 531102 C CA MISSION ST ... 391 665 3115 35 2083 52153 5903 {'x': -122.40399999999993, 'y': 37.78470000000... 4 False
27 28 SAN FRANCISCO TARGET 707846495 45211101 35495 531102 D CA BUSH ST ... 322 536 2572 54 2079 55605 7392 {'x': -122.4016, 'y': 37.790900000000065} 2 False
28 29 ALBANY TARGET 403717441 45211101 35495 531102 D CA EASTSHORE HWY ... 753 1219 3456 94 1784 22257 1328 {'x': -122.30689999999998, 'y': 37.88360000000... 4 False
29 30 ALAMEDA TARGET 434528941 45211101 35495 531102 D CA 5TH ST ... 636 759 1804 23 852 13043 2001 {'x': -122.28149999999995, 'y': 37.78720000000... 0 False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1778 1779 WATERTOWN TARGET 404476340 45211101 85608 531102 D MA ARSENAL ST ... 1494 1833 4279 68 996 47510 2297 {'x': -71.15759999999993, 'y': 42.36410000000006} 4 False
1779 1780 BOSTON TARGET 439199889 45211101 32103 531102 D MA BOYLSTON ST ... 1488 1716 3799 45 2029 97063 3458 {'x': -71.09969999999996, 'y': 42.344400000000... 2 False
1780 1781 DORCHESTER TARGET 404478002 45211101 32103 531102 D MA ALLSTATE RD ... 2312 1978 2695 219 582 37260 3366 {'x': -71.06379999999999, 'y': 42.32540000000007} 0 False
1781 1782 SOMERVILLE TARGET 404467681 45211101 42804 531102 D MA SOMERVILLE AVE ... 2909 3249 5113 57 1221 74269 3633 {'x': -71.08999999999995, 'y': 42.37760000000006} 4 False
1782 1783 EVERETT TARGET 580926004 45211101 44944 531102 D MA MYSTIC VIEW RD ... 5182 3673 3085 142 291 67317 4977 {'x': -71.07049999999995, 'y': 42.40160000000008} 4 False
1783 1784 REVERE TARGET 404478004 45211101 2140 531102 D MA FURLONG DR ... 3699 1516 957 208 171 56423 3471 {'x': -71.00139999999996, 'y': 42.40070000000002} 0 False
1784 1785 KINGSTON TARGET 233726918 45211101 32103 531102 D MA KINGSTON COLLECTION WAY ... 1467 927 910 79 246 20030 1091 {'x': -70.70319999999998, 'y': 41.97980000000013} 0 False
1785 1786 LEOMINSTER TARGET 404460140 45211101 1070 531102 C MA ORCHARD HILL PARK DR ... 769 307 193 146 51 13257 699 {'x': -71.71219999999991, 'y': 42.52810000000007} 0 False
1786 1787 LOWELL TARGET 716737994 45211101 32103 531102 D MA PLAIN ST ... 3995 1768 1477 393 129 55463 3993 {'x': -71.31809999999997, 'y': 42.62450000000012} 4 False
1787 1788 NASHUA TARGET 404476441 45211101 45133 531102 D NH AMHERST ST ... 1158 549 342 150 74 14518 519 {'x': -71.53639999999992, 'y': 42.80030000000008} 0 False
1788 1789 NASHUA TARGET 404271499 45211101 56417 531102 D NH DANIEL WEBSTER HWY ... 992 647 394 133 24 14033 913 {'x': -71.44069999999991, 'y': 42.70300000000007} 0 False
1789 1790 BEDFORD TARGET 404476444 45211101 41748 531102 D NH S RIVER RD ... 1427 662 286 167 59 26882 1707 {'x': -71.47189999999995, 'y': 42.942600000000... 4 False
1790 1791 HOOKSETT TARGET 425836948 45211101 33850 531102 D NH QUALITY DR ... 1696 494 327 233 53 34186 1632 {'x': -71.46089999999995, 'y': 43.0458000000001} 4 False
1791 1792 CONCORD TARGET 404478085 45211101 45133 531102 D NH DAMANTE DR ... 269 68 38 70 30 8776 751 {'x': -71.48459999999992, 'y': 43.220600000000... 0 False
1792 1793 WOBURN TARGET 404472244 45211101 60995 531102 D MA COMMERCE WAY ... 3698 3804 3700 42 385 42909 2653 {'x': -71.1362999999999, 'y': 42.51560000000017} 2 False
1793 1794 SAUGUS TARGET 434099121 45211101 32103 531102 D MA LYNN FELLS PKWY ... 5251 4116 4249 148 817 58103 3815 {'x': -71.02409999999995, 'y': 42.48030000000009} 1 False
1794 1795 WILMINGTON TARGET 706932486 45211101 32103 531102 D MA BALLARDVALE ST ... 2125 2155 3063 28 633 28077 1358 {'x': -71.15959999999995, 'y': 42.590600000000... 4 False
1795 1796 SALEM TARGET 232311613 45211101 32103 531102 D MA HIGHLAND AVE ... 2939 979 533 47 94 27473 2102 {'x': -70.91999999999993, 'y': 42.50370000000005} 0 False
1796 1797 DANVERS TARGET 572661924 45211101 21402 531102 D MA INDEPENDENCE WAY ... 3352 1955 1519 57 140 32110 3059 {'x': -70.93589999999995, 'y': 42.553900000000... 4 False
1797 1798 SALEM TARGET 404476440 45211101 45359 531102 D NH S BROADWAY ... 2396 1068 747 280 95 25617 1460 {'x': -71.21489999999994, 'y': 42.76440000000014} 4 False
1798 1799 METHUEN TARGET 653372177 45211101 64206 531102 D MA PLEASANT VALLEY ST ... 2810 1038 835 276 63 37046 2428 {'x': -71.16079999999991, 'y': 42.74080000000003} 4 False
1799 1800 HAVERHILL TARGET 647061712 45211101 32103 531102 D MA COMPUTER DR ... 3228 1164 792 208 38 44862 2623 {'x': -71.11899999999999, 'y': 42.78640000000011} 4 False
1800 1801 GREENLAND TARGET 716991245 45211101 33850 531102 D NH GREENLAND RD ... 1259 855 691 110 305 17278 792 {'x': -70.81869999999998, 'y': 43.04930000000009} 0 False
1801 1802 SOMERSWORTH TARGET 404478087 45211101 67700 531102 D NH ANDREWS RD ... 750 403 68 171 52 22790 1053 {'x': -70.88269999999991, 'y': 43.234700000000... 0 False
1802 1803 PLATTSBURGH TARGET 719443137 45211101 33924 531102 D NY SMITHFIELD BLVD ... 116 60 95 552 23 15530 916 {'x': -73.49119999999994, 'y': 44.70240000000007} 0 False
1803 1804 BIDDEFORD TARGET 402052516 45211101 32229 531102 D ME MARINER WAY ... 240 53 41 28 19 5589 244 {'x': -70.5133999999999, 'y': 43.473100000000116} 0 False
1804 1805 SOUTH PORTLAND TARGET 404465444 45211101 55864 531102 D ME RUNNING HILL RD ... 605 450 189 96 22 14518 719 {'x': -70.34589999999994, 'y': 43.63580000000009} 0 False
1805 1806 TOPSHAM TARGET 404459522 45211101 32229 531102 D ME TOPSHAM FAIR MALL RD ... 252 124 56 63 60 8951 619 {'x': -69.98529999999995, 'y': 43.93390000000004} 0 False
1806 1807 AUGUSTA TARGET 404472262 45211101 32229 531102 D ME CROSSING WAY ... 45 34 32 276 21 8544 685 {'x': -69.80609999999996, 'y': 44.30930000000007} 0 False
1807 1808 BANGOR TARGET 404476359 45211101 32229 531102 D ME LONGVIEW DR ... 133 79 52 204 15 9242 604 {'x': -68.74109999999993, 'y': 44.84110000000006} 0 False

1808 rows × 163 columns


In [24]:
df.groupby("segment_3").CITY.value_counts()


Out[24]:
segment_3  CITY           
False      CHICAGO            14
           SAN ANTONIO        14
           HOUSTON            13
           LAS VEGAS          10
           AUSTIN              9
           CHARLOTTE           8
           COLUMBUS            8
           JACKSONVILLE        8
           KANSAS CITY         8
           MINNEAPOLIS         8
           PHOENIX             8
           SAN JOSE            8
           ST PAUL             8
           ATLANTA             7
           LOUISVILLE          7
           MIAMI               7
           PITTSBURGH          7
           SAN DIEGO           7
           SPRINGFIELD         7
           TUCSON              7
           AURORA              6
           CINCINNATI          6
           MILWAUKEE           6
           OMAHA               6
           ORLANDO             6
           PHILADELPHIA        6
           ROCHESTER           6
           SACRAMENTO          6
           TAMPA               6
           ALBUQUERQUE         5
                              ..
True       CHANDLER            1
           CHICAGO             1
           COMMACK             1
           CUPERTINO           1
           DALY CITY           1
           ENCINITAS           1
           FALLS CHURCH        1
           GLENVIEW            1
           GRANADA HILLS       1
           HACKENSACK          1
           HICKSVILLE          1
           HIGHLANDS RANCH     1
           LONG BEACH          1
           MANHATTAN BEACH     1
           MOUNTAIN VIEW       1
           NILES               1
           PARAMUS             1
           PEORIA              1
           PHOENIX             1
           POWAY               1
           ROCKVILLE           1
           SAN DIMAS           1
           SEAL BEACH          1
           ST LOUIS            1
           ST LOUIS PARK       1
           SUNNYVALE           1
           VAUXHALL            1
           WALNUT CREEK        1
           WILLOWBROOK         1
           WOODRIDGE           1
Name: CITY, Length: 1230, dtype: int64

Map Results

After doing this analysis, we are going to begin by showing the locations of the top performing stores on the map relative to the rest of the stores.


In [17]:
top_segment = df_summary.iloc[0].segment
top_segment


Out[17]:
3.0

This is a first stab at mapping the data just using the draw method, which plots the data on the map much like just using a sharpie. The data is not organized into a layer per say, so it is not very useful. Still, it is interesting at a coursory level, just seeing the rest of the stores, and the top performers.


In [26]:
store_map = gis.map('Meades Ranch, KS', 4)
store_map.basemap = 'gray'
store_map



In [19]:
for index, row in df[df.segment != top_segment].iterrows():
    store_map.draw(row.SHAPE, symbol={'type': 'esriSMS', 'style': 'esriSMSCircle', 'size': 4})
for index, row in df[df.segment == top_segment].iterrows():
    store_map.draw(row.SHAPE, symbol={'type': 'esriSMS', 'style': 'esriSMSCircle', "color":[0,255,0,255], "size":6})

PCA


In [18]:
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

In [41]:
pca = PCA(n_components=2)

In [50]:
df['x'] = pca.fit_transform(df[fields_independent])[:,0]
df['y'] = pca.fit_transform(df[fields_independent])[:,1]

In [61]:
ax = plt.scatter(data=df, x='x', y='y', c='segment')

cluster_centers = pca.transform(cluster.cluster_centers_)
cluster_centers = pd.DataFrame(cluster_centers, columns=['x', 'y'])
cluster_centers['segment'] = range(0, len(cluster_centers))

plt.plot(data=cluster_centers)

plt.show()


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-61-6b3337d6cc3b> in <module>()
      1 ax = plt.scatter(data=df, x='x', y='y', c='segment')
      2 
----> 3 cluster_centers = pca.transform(cluster.cluster_centers_)
      4 cluster_centers = pd.DataFrame(cluster_centers, columns=['x', 'y'])
      5 cluster_centers['segment'] = range(0, len(cluster_centers))

NameError: name 'cluster' is not defined

In [ ]: